Smart Video Surveillance for Proactive Security 


“The attack took just a 
few minutes. Times 
Square was aglow in 
the morning darkness 
but nearly deserted as a 
shadowy figure on a bicycle pedaled in 
and planted a small bomb that shattered 
the glass facade of the military recruiting 
station on Broadway just north of 43rd 
Street” [3]. This attack took place at 3:43 
a.m. on 6 March 2008. While this particular 
incident did not cause much damage, 
the effort to prevent such incidents and 
rapidly investigate and apprehend the 
perpetrators remains a top priority for 
homeland security and police depart- 
ment officials across the globe. 

This article presents an overview of 
smart video surveillance technologies 
and their application to incidents like 
this one. The technology and system 
described in this article originated as a 
research effort within IBM's T.J. Watson 
Research Center and has since matured 
into a fully supported offering from 
IBM’s Global Technology Services arm 
called the IBM Smart Surveillance 
Solution. The solution is currently 
deployed at multiple customer loca- 
tions worldwide. 




SMART SURVEILLANCE SYSTEMS 

A typical surveillance system involves 
large numbers of cameras deployed in 
the streets with a network that aggre- 
gates the camera feeds to a command 
center. The feeds may be monitored in 
real time at the command center and 
archived for investigative purposes. Such 
networked systems provide “situational 
awareness over vast urban areas.” 
However, they leave the entire burden of 
watching video, detecting threats, and 
locating suspects to the human operator. 
This process of manually watching video 
is known to be tedious, ineffective [1], 
and expensive. 

Intelligent (smart) surveillance sys- 
tems, which are now “watching the 
video” and providing alerts and content- 
based search capabilities, make the video 
monitoring and investigation process 
scalable and effective. The software algo- 
rithms that analyze the video and pro- 
vide alerts are commonly referred to as 
video analytics. These are responsible for 
turning video cameras from a mere data 
gathering tool into smart surveillance 
systems for proactive security. Smart 
surveillance systems have been enabled 
by the advances in computer vision, 
video analysis, pattern recognition and 
multimedia indexing technologies over 
the past decade. 

A high-level representation of a smart 
surveillance system is illustrated in 
Figure 1, where cameras on the street 
are connected back to a video encoder 
that converts the video into an IP stream 
(no encoders are needed in the case of IP 
cameras). These streams are aggregated 
and stored onto a server by a video man- 
agement system that manages the cam- 
eras, video archive, and monitors. The 
video analytics components of the system 
run on the server and provide two 
types of functions to the people in the 
command center. 
= Real-time threat alerts: These 
alerts are generated when a user-defined 
event occurs within a camera's 
field of view. For example, the 
system can alert if a package is left 
static within a camera's field of view 
for more than a specified time (for 
instance, 2 min). 
= Rapid video search: The system 
allows the user to search through mul- 
tiple cameras for specific types of 
objects or events. For example, an investigator 
may look for all red cars that 
passed under a camera over the course 
of the last ten days or for people riding 
bikes in the area of Times Square 
(many cameras) over the last week. 
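
The abandoned-package alert mentioned above can be pictured as a dwell-time check over the output of an object tracker. The following is a minimal sketch under stated assumptions, not the implementation used in the IBM system: the Observation record, the find_abandoned function, and the 5-pixel movement tolerance are hypothetical, and only the 2 min threshold comes from the text.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Observation:
    """One tracker report for an object (hypothetical record layout)."""
    track_id: int
    t: float  # seconds since the start of the stream
    x: float  # object centroid in image coordinates (pixels)
    y: float


def find_abandoned(observations, max_move_px=5.0, min_dwell_s=120.0):
    """Return the ids of tracks that stayed static for at least min_dwell_s.

    A track counts as static while its centroid stays within max_move_px
    of the point where it last came to rest.
    """
    anchor = {}   # track_id -> (t, x, y) where the object last came to rest
    alerted = []
    for ob in sorted(observations, key=lambda o: o.t):
        rest = anchor.get(ob.track_id)
        if rest is None or hypot(ob.x - rest[1], ob.y - rest[2]) > max_move_px:
            # First sighting, or the object moved: restart the dwell timer.
            anchor[ob.track_id] = (ob.t, ob.x, ob.y)
        elif ob.t - rest[0] >= min_dwell_s and ob.track_id not in alerted:
            alerted.append(ob.track_id)
    return alerted


# Synthetic data: a bag (track 1) that stays put and a pedestrian (track 2)
# who keeps walking, both reported every 30 s for 3 min.
obs = [Observation(1, float(t), 100.0, 200.0) for t in range(0, 181, 30)]
obs += [Observation(2, float(t), 10.0 + 3.0 * t, 50.0) for t in range(0, 181, 30)]
print(find_abandoned(obs))
```

In this sketch only the stationary track is reported. A deployed system would also have to cope with occlusions, lost tracks, and camera noise, which is where much of the false-alarm handling discussed later comes in.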


SMART SURVEILLANCE 
TECHNOLOGIES 

The technologies that go into making 
surveillance systems “smart, useable, and 
scalable” draw on many fields of science 
and engineering. While the technologies 
today provide significant value, several 
of the underlying challenges are open 
scientific problems and will continue to 
[FIG1] Video encoders capture video and servers store the video. Smart surveillance 
software analyzes the video to create real-time alerts and a searchable video index. 


IEEE SIGNAL PROCESSING MAGAZINE [136] JULY 2008 


1053-5888/08/$25.00©2008IEEE 






evolve over the next few decades, promis- 
ing to provide better tools to homeland 
security and law enforcement officials in 
the future. 

A number of distinct and highly spe- 
cialized techniques and algorithms are 
used in a typical smart surveillance sys- 
tem [2], which can be summarized as 
follows: 

= Plug-and-play analytics frameworks: 
Video cameras capture a wide 
range of information about people, 
vehicles, and events. The type of information 
captured depends on a 
number of parameters, such as camera 
type, angle, field of view, and resolution. 
Automatically detecting each 
type of information requires a specialized 
set of algorithms. For example, 
automatically reading license plates 
requires specialized optical character 
recognition (OCR) algorithms; capturing 
face images requires face detection 
algorithms; and recognizing behaviors, 
such as finding abandoned packages, requires 
detection and tracking algorithms. A 
smart surveillance system needs to 
support all of these algorithms, typically 
through a plug-and-play framework [5]. 

= Object detection and tracking: One 
of the core capabilities of smart surveil- 
lance systems is the ability to detect 
and track moving objects as illustrated 
in Figure 2. Object detection algo- 
rithms are typically statistical learning 
algorithms that dynamically learn the 
scene background model and use the 
reference model to determine which 
parts of the scene correspond to moving 
objects [2]. Tracking algorithms 
associate the movement of objects over 
time, generating a trajectory. These two 
algorithms together take a video 
stream and decompose it into objects 
and events, effectively creating a parse 
tree for the surveillance video. 

= Object and color classification: 
Object classification algorithms classi- 
fy objects into different classes, for 


[FIG2] (a) The home page, with 1) cameras, bottom right, running a variety of video 
analysis capabilities, like license plate recognition, face capture, and behavior analysis; 
2) real-time alert panel, top right; 3) map of the area, top left; and 4) video player, 
bottom left. (b) The results of searching for a red car. (c) A summary view of all of the 
activity in a camera over a selected period, represented as object tracks. 


example, people, vehicles, and animals, 
using training data and calibration 
schemes. Color classification assigns 
the dominant color of the object to 
one of the standard colors (red, green, 
blue, yellow, black, and white). These 
attributes become part of the searchable 
index, thus allowing a user to 
query for “red vehicles” or “blue 
(clothes) people.” 

= Alert definition and detection: 
Typical smart surveillance systems 
support a variety of user-defined 
behavior detection capabilities such as 
detecting motion within a defined 
zone, detecting objects that cross a 
user-defined virtual boundary, and 
detecting objects that are abandoned. 
Graphical user interface (GUI) tools 
are used to define zones of interest, 
object sizes, and other parameters 
needed to define the behavior. When 
the behavior of interest occurs within 
a camera field of view, the system 
automatically generates an alert mes- 
sage that can be transmitted to a 
workstation, PDA, or e-mail reader, 
depending on the user’s preference. 

= Database event indexing: The 
events detected by the video analysis 
algorithms are indexed by content and 
stored in a database. This allows 
events to be cross-referenced across 
multiple spatially distributed cameras 
and creates a historical archive of 
events. The event index information 
typically includes time of occurrence, 
camera identifier, event type, object 
type, object appearance attributes, and 
an index into the video repository, 
which allows the user to “play back 
the relevant video at the touch of a 
button.” 

= Search and retrieval: Users can use 
a variety of GUI tools to define com- 
plex search criteria to retrieve specific 
events. Events are typically presented 
as shown in Figure 2(b). Search criteria 
include object size, color, location 
in the scene, velocity, time of occur- 
rence, and several other parameters. 
The results of a search can also be 
rendered in a variety of summary 
views, one of which (called track sum- 
mary) is shown in Figure 2(c). 
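
The object detection and tracking item in the list above describes learning a background model and flagging the parts of the scene that depart from it. A minimal sketch of that idea follows. It substitutes a simple running-average background for the statistical learning methods the text refers to, works on tiny synthetic grayscale frames held in plain Python lists, and uses hypothetical function names and parameter values (alpha, threshold) chosen only for illustration.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30.0):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of foreground pixels, or None."""
    hits = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


def detect(frames, alpha=0.05, threshold=30.0):
    """Yield one bounding box (or None) per frame after the first."""
    background = [[float(p) for p in row] for row in frames[0]]
    for frame in frames[1:]:
        mask = foreground_mask(background, frame, threshold)
        yield bounding_box(mask)
        background = update_background(background, frame, alpha)


def make_frame(col):
    """Synthetic 4x6 scene: dark background, optional bright 2x1 object."""
    frame = [[10] * 6 for _ in range(4)]
    if col is not None:
        frame[1][col] = frame[2][col] = 200
    return frame


# The object enters at column 1 and moves one column per frame.
frames = [make_frame(None), make_frame(1), make_frame(2), make_frame(3)]
boxes = list(detect(frames))

# Linking the box centers over time gives a (row, column) trajectory.
trajectory = [((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in boxes if b is not None]
print(boxes)
print(trajectory)
```

With a single object, linking successive box centers is enough to form a trajectory. Real scenes with several interacting objects need a proper data-association step, which this sketch leaves out.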

SMART SURVEILLANCE EXAMPLE 

The investigative process for a typical 
crime scene like the Times Square inci- 
dent involves many steps including gath- 
ering forensic evidence, witness 
interviews, and photographic evidence 
gathering. From a video camera perspec- 
tive, the main steps of the process consist 
of the camera canvass, the video footage 
acquisition, and the video investigation. 

During the camera canvass, the police 
look for both public and private surveil- 
lance cameras in the area of the incident. 
Video footage from relevant cameras is 
gathered by the police for further investigation. 
The volume of this footage typically 
runs into tens of hours and can 
reach hundreds to thousands of hours 
based on the scope of the investigation. 
The video aggregated from multiple 
sources is organized and viewed by inves- 
tigators until they find the clips that pro- 
vide useful evidence. 

Typically, just the video camera part 
of the investigation can involve tens to 
hundreds of officers for gathering the 
video and many hundreds of man-hours 
of video investigations. The labor 
involved in a video-based investigation 
sometimes becomes so prohibitive that 
the agencies may not be able to leverage 
the information that is buried in the 
hundreds of hours of video. Smart 
surveillance systems can be employed 
(and could have been, in cases 
such as the Times Square incident) to 
provide real-time alerts and aid the 
investigation. 


REAL-TIME ALERTS 
IN TIMES SQUARE 

Alerts can be set up to automatically 
detect behaviors, like abandoned bags or 
people riding bikes. The alerts can also be 
scheduled to be 
operational only during certain times of 
the day or night. A video surveillance 
system configured this way could have 
detected the bag that was dropped off by 
the cyclist, provided that there was cam- 
era coverage of the area with sufficient 
resolution on the bag. However, even if 
the system had provided an alert when 
the bag was dropped off, the response 
process to act on such an alert and the 


time required to respond would have 
been too great to prevent the incident 
from happening. 
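
The scheduling of alerts by time of day mentioned above has one subtlety worth illustrating: an overnight window ends at an earlier clock time than it starts. The sketch below shows one way to test whether an alert is active. The alert_active function and the 22:00 to 06:00 window are assumptions made for illustration and are not taken from any particular product.

```python
from datetime import time


def alert_active(now, start, end):
    """Return True if `now` falls inside the daily window [start, end).

    A window whose end is earlier than its start wraps past midnight.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical overnight window: 22:00 to 06:00.
night_start, night_end = time(22, 0), time(6, 0)
in_window = alert_active(time(3, 43), night_start, night_end)   # early morning
out_window = alert_active(time(12, 0), night_start, night_end)  # midday
print(in_window, out_window)
```

An early-morning event such as the one at 3:43 a.m. falls inside this window, while midday activity does not, which is one way to keep benign daytime traffic from generating alerts.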

When designing a system to provide 
automatic real-time alerts in an area as 
busy as Times Square, the configuration 
of the camera coverage and analytics, 
the handling of false alarms, and the 
response process need to be considered. 
The camera coverage needs to be 
designed to take into account the vul- 
nerabilities in the area and provide ade- 
quate resolution to enable the 
automatic detection. Once the cameras 
are in place, the alerts would need to be 
set up and configured appropriately for 
different threat scenarios. Given the 
level of activity in the Times Square 
area, it is reasonable to expect that 
there would be a number of incidents of 
abandoned bags, which, upon investiga- 
tion, turn out to be benign. Additional 
false alarms may result from errors in 
the automatic detection. The response 
process and policy for dealing with 
alerts from such systems would need to 
be developed by the police and other 
public safety agencies. 


INVESTIGATING THE 
TIMES SQUARE INCIDENT 

If we assume that all the existing cameras 
in Times Square and the neighboring 
areas (for instance, a total of 100 cameras) 
were connected to a smart surveillance 
system, the system could have provided 
several capabilities: 
= Rapid location of footage of the 
incident: After an incident occurs, 
investigative agencies could 
use the search capability to locate 
all events based on a variety of 
search criteria such as speed of 
object, size, and location in scene. 
It is reasonable to expect that using 
such a centralized search capability 
would allow locating the clip of the 
incident within minutes of the 
occurrence of the event as opposed 
to the many hours it takes with a 
manual process. 
= Quick tracking of the perpetrator 
across cameras: Once the investigators 
had concluded 
that the perpetrator was riding a 
bike and heading southeast, they 
could use the cross-camera 
search capability of the system to 
quickly track the movement of the 
suspect from one camera to the next. Doing so 
would have allowed the investigators 
to quickly focus their efforts on certain 
areas of the city. 

= Investigation of scouting activities: 
Typically, such incidents are planned 
and involve some scouting of the 
scene by the perpetrators prior to the 
incident. Using the search capabilities 
of the system, investigators could 
quickly look through events of inter- 
est like biking over the past weeks and 
gather more information for the 
investigation. 
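
The search capabilities listed above rest on the event index described earlier: each detected event is stored with its time, camera, type, and a pointer into the video archive. A minimal sketch using Python's built-in sqlite3 module follows. The table layout, column names, camera identifiers, and event rows are all made up for illustration and do not describe the schema of any real system or the data of any real incident.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        occurred_at  TEXT,  -- ISO 8601 timestamp, sorts chronologically
        camera_id    TEXT,
        event_type   TEXT,
        object_type  TEXT,
        color        TEXT,  -- one appearance attribute
        video_offset REAL   -- seconds into that camera's archived video
    )
    """
)
conn.execute("CREATE INDEX idx_events_type_time ON events (event_type, occurred_at)")

# Synthetic events from four hypothetical cameras.
rows = [
    ("2000-01-01T03:40:10", "cam-07", "biking", "person", "black", 13210.0),
    ("2000-01-01T03:41:02", "cam-12", "driving", "vehicle", "red", 13262.0),
    ("2000-01-01T03:43:30", "cam-03", "biking", "person", "black", 13410.0),
    ("2000-01-01T03:45:05", "cam-19", "biking", "person", "black", 13505.0),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)


def search(conn, event_type, start, end):
    """Return (time, camera, video offset) for matching events, oldest first."""
    cur = conn.execute(
        "SELECT occurred_at, camera_id, video_offset FROM events "
        "WHERE event_type = ? AND occurred_at BETWEEN ? AND ? "
        "ORDER BY occurred_at",
        (event_type, start, end),
    )
    return cur.fetchall()


hits = search(conn, "biking", "2000-01-01T03:30:00", "2000-01-01T04:00:00")
# Ordering the hits by time gives the sequence of cameras the cyclist passed.
camera_path = [camera for _, camera, _ in hits]
print(camera_path)
```

Because every row carries a camera identifier and a video offset, one query both reconstructs a path across cameras and tells the investigator where to start playback in each archive. Widening the time range to past weeks gives the kind of scouting search described in the last item.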


PROCESS, PRIVACY AND POLICY 
While technologies like networked 
video surveillance, smart surveillance, 
IP cameras, and high-density storage 
continue to improve the tools for surveillance, 
a number of process, privacy, and policy 
issues must be addressed for 
operational security systems to succeed. Currently, 
most of our security agencies are 
geared largely to respond to events and 
use video surveillance in a reactive 
monitoring process. Technologies like 
smart surveillance begin to enable a 
proactive monitoring process. 

Deploying a smart surveillance sys- 
tem introduces a new stream of video- 
based alarms into the command center. 
To ensure successful deployment, the 
customers and technology providers 
should jointly address key issues such 
as training of operators to use sophisticated 
technologies and the process 
needed to evaluate an alarm condition 
and determine the appropriate response. 
The system must be 
designed, configured, and tuned to 
minimize the impact of false alarms. 

As the technology to monitor areas 
for purposes of law enforcement and 
homeland security evolves, it 
typically raises the issue of 
citizen privacy in public spaces. These 
challenges can be addressed both at 
the technology and policy levels. 
Citizen privacy can be protected by 
enabling privacy-preserving technologies 
in the surveillance systems [4]. 
The technical enablers of privacy then 
have to be put into practice by formu- 
lating, implementing, and enforcing 
appropriate policies that govern the 
use of such systems. 


TOWARDS PROACTIVE SECURITY 

Preventing incidents like the Times 
Square bombing requires a huge effort 
that involves multiple methods of intel- 
ligence gathering and analysis. Tools 
such as the smart surveillance system 
bring in a new dimension of information 
into the security analysis and intelligence 
gathering process and can prove 
to be extremely valuable. The capability 
to search through large amounts of video 
data, correlate events across cameras, 
and correlate video events to other infor- 
mation becomes the basis for moving 
security from a reactive paradigm to a 
rapid investigation paradigm and eventually 
a proactive paradigm. This evolution 
to proactive security will necessarily 
involve joint technology, process, and 
policy evolution. 